Contextual Embeddings-Based Web Page Categorization Using the Fine-Tune BERT Model
Abstract
The World Wide Web has revolutionized the way we live, causing the number of web pages to increase exponentially. The web provides access to a tremendous amount of information, so it is difficult for internet users to locate accurate and useful information on the web. In order to categorize pages accurately based on the queries of users, methods of categorizing web pages need to be developed. The text content of web pages plays a significant role in the categorization of web pages. If a word's position is altered within a sentence, causing a change in the interpretation of that sentence, this phenomenon is called polysemy. In web page categorization, the polysemy property causes ambiguity and is referred to as the polysemy problem. This paper proposes a fine-tuned model to solve the polysemy problem, using contextual embeddings created by the symmetry multi-head encoder layer of the Bidirectional Encoder Representations from Transformers (BERT). The effectiveness of the proposed model was evaluated on the benchmark datasets, i.e., WebKB and DMOZ. Furthermore, the experiment series also fine-tuned the proposed model's hyperparameters to achieve 96.00% and 84.00% F1-Scores, respectively, demonstrating the proposed model's importance compared to baseline approaches based on machine learning and deep learning.
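As a rough illustration of why contextual embeddings help with ambiguous words, the toy sketch below runs one head of scaled dot-product self-attention over hand-picked vectors. It is not BERT and not the authors' model: the vocabulary, the three-dimensional vectors in `STATIC`, the `contextualize` helper, and the use of identity query/key/value projections are all assumptions made for illustration.

```python
import math

# Hypothetical static embeddings, chosen by hand for this illustration only.
STATIC = {
    "river": [1.0, 0.0, 0.0],
    "money": [0.0, 1.0, 0.0],
    "bank": [0.5, 0.5, 1.0],
}


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def contextualize(tokens):
    """Single-head self-attention with identity projections (an assumption).

    Each output vector is an attention-weighted average of every token
    vector in the sentence, so it depends on the surrounding words.
    """
    vecs = [STATIC[t] for t in tokens]
    dim = len(vecs[0])
    outputs = []
    for query in vecs:
        weights = softmax([dot(query, key) / math.sqrt(dim) for key in vecs])
        outputs.append(
            [sum(w * v[i] for w, v in zip(weights, vecs)) for i in range(dim)]
        )
    return outputs


# The same surface word receives a different vector in each context.
bank_in_river = contextualize(["river", "bank"])[1]
bank_in_money = contextualize(["money", "bank"])[1]
print(bank_in_river)
print(bank_in_money)
```

In this sketch the vector for "bank" leans toward the "river" axis in one sentence and toward the "money" axis in the other, whereas a static lookup in `STATIC` would return the same vector both times. A real BERT encoder stacks many such heads with learned projections, which this example does not attempt to reproduce.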
Related resources
Web Page Categorization Using Artificial Neural Networks
Web page categorization is one of the challenging tasks in the world of ever-increasing web technologies. There are many ways to categorize web pages based on different approaches and features. This paper proposes a new dimension in web page categorization using an artificial neural network (ANN) that extracts features automatically. Here eight major categories of web ...
Contextual object categorization with energy-based model
Object categorization is a hot issue in image mining. Contextual information between objects is an important form of semantic knowledge in an image. However, previous research on object categorization has not made full use of contextual information, especially the spatial relations between objects. In addition, the object categorization methods, which generally use the probabi...
Web Page Categorization using Multilayer Perceptron with Reduced Features
The web is a huge repository of knowledge and numerous hyperlinks. The web also serves a broad diversity of user communities and global information service centers. Every day the knowledge in web pages grows rapidly. Web pages can be used to convey knowledge to web users. The voluminous size of the web complicates web information retrieval, web content filtering and web structure mi...
Distributed Approach to Web Page Categorization Using Map-Reduce Programming Model
The web is a large repository of information and to facilitate the search and retrieval of pages from it, categorization of web documents is essential. An effective means to handle the complexity of information retrieval from the internet is through automatic classification of web pages. Although lots of automatic classification algorithms and systems have been presented, most of the existing a...
A Novel Web Page Categorization Algorithm Based on Block Propagation Using Query-Log Information
Most existing web page classification algorithms, including content-based, link-based, or query-log analysis methods, treat pages as the smallest units. However, web pages usually contain some noisy or biased information which could affect the performance of classification. In this paper, we propose a Block Propagation Categorization (BPC) algorithm which deeply mines web structure and views block...
Journal
Journal title: Symmetry
Year: 2023
ISSN: 0865-4824, 2226-1877
DOI: https://doi.org/10.3390/sym15020395